Chapter 27
Understanding Stateful LSTM Recurrent Neural Networks

A powerful and popular recurrent neural network is the long short-term memory network, or LSTM. It is widely used because the architecture overcomes the vanishing and exploding gradient problem that plagues all recurrent neural networks, allowing very large and very deep networks to be created. Like other recurrent neural networks, LSTM networks maintain state, and the specifics of how this is implemented in the Keras framework can be confusing. In this lesson you will discover exactly how state is maintained in LSTM networks by the Keras deep learning library. After reading this lesson you will know:

- How to develop a naive LSTM network for a sequence prediction problem.
- How to carefully manage state through batches and features with an LSTM network.
- How to manually manage state in an LSTM network for stateful prediction.

Let's get started.

27.1 Problem Description: Learn the Alphabet

In this tutorial we are going to develop and contrast a number of different LSTM recurrent neural network models. The context of these comparisons will be a simple sequence prediction problem of learning the alphabet. That is, given a letter of the alphabet, predict the next letter of the alphabet. This is a simple sequence prediction problem that, once understood, can be generalized to other sequence prediction problems like time series prediction and sequence classification. Let's prepare the problem with some Python code that we can reuse from example to example. Firstly, let's import all of the classes and functions we plan to use in this tutorial.

import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils

Listing 27.1: Import Classes and Functions.
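Note that the np_utils import reflects the older standalone Keras API used throughout this chapter. The snippet below is a hedged compatibility sketch (an assumption about newer Keras/TensorFlow releases, not part of the original listing); with it, the later call np_utils.to_categorical(dataY) in Listing 27.8 would become a plain to_categorical(dataY).

try:
    # older standalone Keras, as used in the listings in this chapter
    from keras.utils import np_utils
    to_categorical = np_utils.to_categorical
except ImportError:
    # assumed newer tf.keras equivalent
    from tensorflow.keras.utils import to_categorical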

Next, we can seed the random number generator to ensure that the results are the same each time the code is executed.

# fix random seed for reproducibility
numpy.random.seed(7)

Listing 27.2: Seed the Random Number Generators.

We can now define our dataset, the alphabet. We define the alphabet in uppercase characters for readability. Neural networks model numbers, so we need to map the letters of the alphabet to integer values. We can do this easily by creating a dictionary (map) of the letter index to the character. We can also create a reverse lookup for converting predictions back into characters to be used later.

# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))

Listing 27.3: Define the Alphabet Dataset.

Now we need to create our input and output pairs on which to train our neural network. We can do this by defining an input sequence length, then reading sequences from the input alphabet sequence. For example, we use an input length of 1. Starting at the beginning of the raw input data, we can read off the first letter A and the next letter as the prediction B. We move along one character and repeat until we reach a prediction of Z.

# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)

Listing 27.4: Create Patterns from Dataset.

We also print out the input pairs for sanity checking. Running the code to this point will produce the following output, summarizing input sequences of length 1 and a single output character.

A -> B
B -> C
C -> D
D -> E
E -> F
F -> G
G -> H
H -> I
I -> J
J -> K
K -> L
L -> M
M -> N
N -> O
O -> P
P -> Q
Q -> R
R -> S
S -> T
T -> U
U -> V
V -> W
W -> X
X -> Y
Y -> Z

Listing 27.5: Sample Alphabet Training Patterns.

We need to reshape the NumPy array into a format expected by the LSTM networks, that is [samples, time steps, features].

# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))

Listing 27.6: Reshape Training Patterns for LSTM Layer.

Once reshaped, we can then normalize the input integers to the range 0-to-1, the range of the sigmoid activation functions used by the LSTM network.

# normalize
X = X / float(len(alphabet))

Listing 27.7: Normalize Training Patterns.

Finally, we can think of this problem as a sequence classification task, where each of the 26 letters represents a different class. As such, we can convert the output (y) to a one hot encoding, using the Keras built-in function to_categorical().

# one hot encode the output variable
y = np_utils.to_categorical(dataY)

Listing 27.8: One Hot Encode Output Patterns.

We are now ready to fit different LSTM models.

27.2 LSTM for Learning One-Char to One-Char Mapping

Let's start off by designing a simple LSTM to learn how to predict the next character in the alphabet given the context of just one character. We will frame the problem as a random collection of one-letter input to one-letter output pairs. As we will see, this is a difficult framing of the problem for the LSTM to learn. Let's define an LSTM network with 32 units and an output layer using the softmax activation function for making predictions. Because this is a multiclass classification problem, we can use the log loss function (called categorical_crossentropy in Keras), and optimize the network using the ADAM optimization function. The model is fit over 500 epochs with a batch size of 1.

# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)

Listing 27.9: Define and Fit LSTM Network Model.

After we fit the model we can evaluate and summarize the performance on the entire training dataset.

# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))

Listing 27.10: Evaluate the Fit LSTM Network Model.

We can then re-run the training data through the network and generate predictions, converting both the input and output pairs back into their original character format to get a visual idea of how well the network learned the problem.

# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)

Listing 27.11: Make Predictions Using the Fit LSTM Network.

The entire code listing is provided below for completeness.

# Naive LSTM to learn one-char to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)

Listing 27.12: LSTM Network for one-char to one-char Mapping.

Running this example produces the following output.

Model Accuracy: 84.00%
['A'] -> B
['B'] -> C
['C'] -> D
['D'] -> E
['E'] -> F
['F'] -> G
['G'] -> H
['H'] -> I
['I'] -> J
['J'] -> K
['K'] -> L
['L'] -> M
['M'] -> N
['N'] -> O
['O'] -> P
['P'] -> Q
['Q'] -> R
['R'] -> S
['S'] -> T
['T'] -> U
['U'] -> W
['V'] -> Y
['W'] -> Z
['X'] -> Z
['Y'] -> Z

Listing 27.13: Output from the one-char to one-char Mapping.

We can see that this problem is indeed difficult for the network to learn. The reason is that the poor LSTM units do not have any context to work with. Each input-output pattern is shown to the network in a random order and the state of the network is reset after each pattern (each batch, where each batch contains one pattern). This is abuse of the LSTM network architecture, treating it like a standard Multilayer Perceptron. Next, let's try a different framing of the problem in order to provide more sequence to the network from which to learn.

27.3 LSTM for a Feature Window to One-Char Mapping

A popular approach to adding more context to data for Multilayer Perceptrons is to use the window method. This is where previous steps in the sequence are provided as additional input features to the network. We can try the same trick to provide more context to the LSTM network. Here, we increase the sequence length from 1 to 3, for example:

# prepare the dataset of input to output pairs encoded as integers
seq_length = 3

Listing 27.14: Increase Sequence Length.

Which creates training patterns like:

ABC -> D
BCD -> E
CDE -> F

Listing 27.15: Sample of Longer Input Sequence Length.

Each element in the sequence is then provided as a new input feature to the network. This requires a modification of how the input sequences are reshaped in the data preparation step:

# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), 1, seq_length))

Listing 27.16: Reshape Input so Sequence is Features.

It also requires a modification for how the sample patterns are reshaped when demonstrating predictions from the model.

x = numpy.reshape(pattern, (1, 1, len(pattern)))

Listing 27.17: Reshape Input for Predictions so Sequence is Features.
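As an aside, the difference between this framing and the time step framing used in the next section comes down entirely to which axis the three characters occupy. The short sketch below is purely illustrative (the stand-in integer patterns are made up and it is not one of the chapter's listings); it prints the two competing shapes side by side.

import numpy
# illustrative stand-in for the 23 three-character alphabet patterns
patterns = [[i, i + 1, i + 2] for i in range(23)]
as_features = numpy.reshape(patterns, (len(patterns), 1, 3))    # [samples, time steps, features]
as_time_steps = numpy.reshape(patterns, (len(patterns), 3, 1))
print(as_features.shape)    # (23, 1, 3): one time step of three features (this section)
print(as_time_steps.shape)  # (23, 3, 1): three time steps of one feature (next section)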

The entire code listing is provided below for completeness.

# Naive LSTM to learn three-char window to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), 1, seq_length))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, 1, len(pattern)))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)

Listing 27.18: LSTM Network for three-char features to one-char Mapping.

Running this example provides the following output.

Model Accuracy: 86.96%
['A', 'B', 'C'] -> D
['B', 'C', 'D'] -> E
['C', 'D', 'E'] -> F
['D', 'E', 'F'] -> G
['E', 'F', 'G'] -> H
['F', 'G', 'H'] -> I
['G', 'H', 'I'] -> J
['H', 'I', 'J'] -> K
['I', 'J', 'K'] -> L
['J', 'K', 'L'] -> M
['K', 'L', 'M'] -> N
['L', 'M', 'N'] -> O
['M', 'N', 'O'] -> P
['N', 'O', 'P'] -> Q
['O', 'P', 'Q'] -> R
['P', 'Q', 'R'] -> S
['Q', 'R', 'S'] -> T
['R', 'S', 'T'] -> U
['S', 'T', 'U'] -> V
['T', 'U', 'V'] -> Y
['U', 'V', 'W'] -> Z
['V', 'W', 'X'] -> Z
['W', 'X', 'Y'] -> Z

Listing 27.19: Output from the three-char Features to one-char Mapping.

We can see a small lift in performance that may or may not be real. This is a simple problem that we were still not able to learn with LSTMs even with the window method. Again, this is a misuse of the LSTM network by a poor framing of the problem. Indeed, the sequences of letters are time steps of one feature rather than one time step of separate features. We have given more context to the network, but not more sequence as it expected. In the next section, we will give more context to the network in the form of time steps.

27.4 LSTM for a Time Step Window to One-Char Mapping

In Keras, the intended use of LSTMs is to provide context in the form of time steps, rather than windowed features like with other network types. We can take our first example and simply change the sequence length from 1 to 3.

seq_length = 3

Listing 27.20: Increase Sequence Length.

Again, this creates input-output pairs that look like:

ABC -> D
BCD -> E
CDE -> F
DEF -> G

Listing 27.21: Sample of Longer Input Sequence Length.

The difference is that the reshaping of the input data takes the sequence as a time step sequence of one feature, rather than a single time step of multiple features.

# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))

Listing 27.22: Reshape Input so Sequence is Time Steps.

This is the intended use of providing sequence context to your LSTM in Keras. The full code example is provided below for completeness.

# Naive LSTM to learn three-char time steps to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)

Listing 27.23: LSTM Network for three-char Time Steps to one-char Mapping.

Running this example provides the following output.

Model Accuracy: 100.00%
['A', 'B', 'C'] -> D
['B', 'C', 'D'] -> E
['C', 'D', 'E'] -> F
['D', 'E', 'F'] -> G
['E', 'F', 'G'] -> H
['F', 'G', 'H'] -> I
['G', 'H', 'I'] -> J
['H', 'I', 'J'] -> K
['I', 'J', 'K'] -> L
['J', 'K', 'L'] -> M
['K', 'L', 'M'] -> N
['L', 'M', 'N'] -> O
['M', 'N', 'O'] -> P
['N', 'O', 'P'] -> Q
['O', 'P', 'Q'] -> R
['P', 'Q', 'R'] -> S
['Q', 'R', 'S'] -> T
['R', 'S', 'T'] -> U
['S', 'T', 'U'] -> V
['T', 'U', 'V'] -> W
['U', 'V', 'W'] -> X
['V', 'W', 'X'] -> Y
['W', 'X', 'Y'] -> Z

Listing 27.24: Output from the three-char Time Steps to one-char Mapping.

We can see that the model learns the problem perfectly, as evidenced by the model evaluation and the example predictions. But it has learned a simpler problem. Specifically, it has learned to predict the next letter from a sequence of three letters in the alphabet. It can be shown any random sequence of three letters from the alphabet and it will predict the next letter. It cannot actually enumerate the alphabet. I expect that a large enough Multilayer Perceptron network might be able to learn the same mapping using the window method. The LSTM networks are stateful. They should be able to learn the whole alphabet sequence, but by default the Keras implementation resets the network state after each training batch.

27.5 LSTM State Maintained Between Samples Within A Batch

The Keras implementation of LSTMs resets the state of the network after each batch. This suggests that if we had a batch size large enough to hold all input patterns, and if all the input patterns were ordered sequentially, the LSTM could use the context of the sequence within the batch to better learn the sequence. We can demonstrate this easily by modifying the first example for learning a one-to-one mapping and increasing the batch size from 1 to the size of the training dataset. Additionally, Keras shuffles the training dataset before each training epoch. To ensure the training data patterns remain sequential, we can disable this shuffling.

model.fit(X, y, epochs=5000, batch_size=len(dataX), verbose=2, shuffle=False)

Listing 27.25: Increase Batch Size to Cover Entire Dataset.

The network will learn the mapping of characters using the within-batch sequence, but this context will not be available to the network when making predictions. We can evaluate both the ability of the network to make predictions randomly and in sequence. The full code example is provided below for completeness.

# Naive LSTM to learn one-char to one-char mapping with all data in each batch
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
from keras.preprocessing.sequence import pad_sequences
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
# convert list of lists to array and pad sequences if needed
X = pad_sequences(dataX, maxlen=seq_length, dtype='float32')
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (X.shape[0], seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(16, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=5000, batch_size=len(dataX), verbose=2, shuffle=False)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)
# demonstrate predicting random patterns
print("Test a Random Pattern:")
for i in range(0,20):
    pattern_index = numpy.random.randint(len(dataX))
    pattern = dataX[pattern_index]
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)

Listing 27.26: LSTM Network for one-char to one-char Mapping Within Batch.

Running the example provides the following output.

Model Accuracy: 100.00%
['A'] -> B
['B'] -> C
['C'] -> D
['D'] -> E
['E'] -> F
['F'] -> G
['G'] -> H
['H'] -> I
['I'] -> J
['J'] -> K
['K'] -> L
['L'] -> M
['M'] -> N
['N'] -> O
['O'] -> P
['P'] -> Q
['Q'] -> R
['R'] -> S
['S'] -> T
['T'] -> U
['U'] -> V
['V'] -> W
['W'] -> X
['X'] -> Y
['Y'] -> Z
Test a Random Pattern:
['T'] -> U
['V'] -> W
['M'] -> N
['Q'] -> R
['D'] -> E
['V'] -> W
['T'] -> U
['U'] -> V
['J'] -> K
['F'] -> G
['N'] -> O
['B'] -> C
['M'] -> N
['F'] -> G
['F'] -> G
['P'] -> Q
['A'] -> B
['K'] -> L
['W'] -> X
['E'] -> F

Listing 27.27: Output from the one-char to one-char Mapping Within Batch.

As we expected, the network is able to use the within-sequence context to learn the alphabet, achieving 100% accuracy on the training data. Importantly, the network can make accurate predictions for the next letter in the alphabet for randomly selected characters. Very impressive.

27.6 Stateful LSTM for a One-Char to One-Char Mapping

We have seen that we can break up our raw data into fixed size sequences and that this representation can be learned by the LSTM, but only to learn random mappings of 3 characters to 1 character. We have also seen that we can pervert batch size to offer more sequence to the network, but only during training. Ideally, we want to expose the network to the entire sequence and let it learn the inter-dependencies, rather than defining those dependencies explicitly ourselves in the framing of the problem. We can do this in Keras by making the LSTM layers stateful and manually resetting the state of the network at the end of the epoch, which is also the end of the training sequence. This is truly how the LSTM networks are intended to be used. We find that by allowing the network itself to learn the dependencies between the characters, we need a smaller network (half the number of units) and fewer training epochs (almost half).

We first need to define our LSTM layer as stateful. In so doing, we must explicitly specify the batch size as a dimension on the input shape. This also means that when we evaluate the network or make predictions, we must also specify and adhere to this same batch size. This is not a problem now, as we are using a batch size of 1. This could introduce difficulties when making predictions when the batch size is not one, as predictions will need to be made in batch and in sequence.

batch_size = 1
model.add(LSTM(16, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))

Listing 27.28: Define a Stateful LSTM Layer.

An important difference in training the stateful LSTM is that we train it manually one epoch at a time and reset the state after each epoch. We can do this in a for loop. Again, we do not shuffle the input, preserving the sequence in which the input training data was created.

for i in range(300):
    model.fit(X, y, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()

Listing 27.29: Manually Manage LSTM State For Each Epoch.

As mentioned, we specify the batch size when evaluating the performance of the network on the entire training dataset.

# summarize performance of the model
scores = model.evaluate(X, y, batch_size=batch_size, verbose=0)
model.reset_states()
print("Model Accuracy: %.2f%%" % (scores[1]*100))

Listing 27.30: Evaluate Model Using Pre-defined Batch Size.

Finally, we can demonstrate that the network has indeed learned the entire alphabet. We can seed it with the first letter A, request a prediction, feed the prediction back in as an input, and repeat the process all the way to Z.

# demonstrate some model predictions
seed = [char_to_int[alphabet[0]]]
for i in range(0, len(alphabet)-1):
    x = numpy.reshape(seed, (1, len(seed), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    print(int_to_char[seed[0]], "->", int_to_char[index])
    seed = [index]
model.reset_states()

Listing 27.31: Seed Network and Make Predictions from A to Z.

We can also see if the network can make predictions starting from an arbitrary letter.

# demonstrate a random starting point
letter = "K"
seed = [char_to_int[letter]]
print("New start: ", letter)
for i in range(0, 5):
    x = numpy.reshape(seed, (1, len(seed), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    print(int_to_char[seed[0]], "->", int_to_char[index])
    seed = [index]
model.reset_states()

Listing 27.32: Seed Network with a Random Letter and a Sequence of Predictions.

The entire code listing is provided below for completeness.

# Stateful LSTM to learn one-char to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
batch_size = 1
model = Sequential()
model.add(LSTM(16, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
for i in range(300):
    model.fit(X, y, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()
# summarize performance of the model
scores = model.evaluate(X, y, batch_size=batch_size, verbose=0)
model.reset_states()
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
seed = [char_to_int[alphabet[0]]]
for i in range(0, len(alphabet)-1):
    x = numpy.reshape(seed, (1, len(seed), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    print(int_to_char[seed[0]], "->", int_to_char[index])
    seed = [index]
model.reset_states()
# demonstrate a random starting point
letter = "K"
seed = [char_to_int[letter]]
print("New start: ", letter)
for i in range(0, 5):
    x = numpy.reshape(seed, (1, len(seed), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    print(int_to_char[seed[0]], "->", int_to_char[index])
    seed = [index]
model.reset_states()

Listing 27.33: Stateful LSTM Network for one-char to one-char Mapping.

Running the example provides the following output.

Model Accuracy: 100.00%
A -> B
B -> C
C -> D
D -> E
E -> F
F -> G
G -> H
H -> I
I -> J
J -> K
K -> L
L -> M
M -> N
N -> O
O -> P
P -> Q
Q -> R
R -> S
S -> T
T -> U
U -> V
V -> W
W -> X
X -> Y
Y -> Z
New start: K
K -> B
B -> C
C -> D
D -> E
E -> F

Listing 27.34: Output from the Stateful LSTM for one-char to one-char Mapping.

We can see that the network has memorized the entire alphabet perfectly. It used the context of the samples themselves and learned whatever dependency it needed to predict the next character in the sequence. We can also see that if we seed the network with the first letter, it can correctly rattle off the rest of the alphabet. We can also see that it has only learned the full alphabet sequence, and only from a cold start. When asked to predict the next letter from K, it predicts B and falls back into regurgitating the entire alphabet. To truly predict K, the state of the network would need to be warmed up by iteratively feeding it the letters from A to J. This tells us that we could achieve the same effect with a stateless LSTM by preparing training data like:

---a -> b
--ab -> c
-abc -> d
abcd -> e

Listing 27.35: Sample of Equivalent Training Data for Non-Stateful LSTM Layers.

Where the input sequence is fixed at 25 (a-to-y to predict z) and patterns are prefixed with zero-padding.
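To make the warm-up idea above concrete, here is a short, hedged sketch (not one of the chapter's listings) that reuses the stateful model, alphabet, char_to_int and int_to_char from Listing 27.33: it replays the letters A to J purely to build up state, then asks for a prediction from K.

# hedged sketch: warm up the stateful model's internal state before querying it
model.reset_states()
for ch in "ABCDEFGHIJ":
    x = numpy.reshape([char_to_int[ch]], (1, 1, 1)) / float(len(alphabet))
    model.predict(x, verbose=0)  # discard the output, we only want the state update
# with the state warmed up, a prediction from K should continue the alphabet
x = numpy.reshape([char_to_int["K"]], (1, 1, 1)) / float(len(alphabet))
index = numpy.argmax(model.predict(x, verbose=0))
print("K ->", int_to_char[index])  # expected to print L if the context carried over
model.reset_states()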

Finally, this raises the question of training an LSTM network using variable length input sequences to predict the next character.

27.7 LSTM with Variable Length Input to One-Char Output

In the previous section we discovered that the Keras stateful LSTM was really only a shortcut to replaying the first n sequences, but didn't really help us learn a generic model of the alphabet. In this section we explore a variation of the stateless LSTM that learns random subsequences of the alphabet, in an effort to build a model that can be given arbitrary letters or subsequences of letters and predict the next letter in the alphabet.

Firstly, we are changing the framing of the problem. To simplify, we will define a maximum input sequence length and set it to a small value like 5 to speed up training. This defines the maximum length of subsequences of the alphabet which will be drawn for training. In extensions, this could just be set to the full alphabet (26) or longer if we allow looping back to the start of the sequence. We also need to define the number of random sequences to create, in this case, 1,000. This too could be more or less. I expect fewer patterns are actually required.

# prepare the dataset of input to output pairs encoded as integers
num_inputs = 1000
max_len = 5
dataX = []
dataY = []
for i in range(num_inputs):
    start = numpy.random.randint(len(alphabet)-2)
    end = numpy.random.randint(start, min(start+max_len,len(alphabet)-1))
    sequence_in = alphabet[start:end+1]
    sequence_out = alphabet[end + 1]
    dataX.append([char_to_int[char] for char in sequence_in])
    dataY.append(char_to_int[sequence_out])
    print(sequence_in, '->', sequence_out)

Listing 27.36: Create Dataset of Variable Length Input Sequences.

Running this code in the broader context will create input patterns that look like the following:

PQRST -> U
W -> X
O -> P
OPQ -> R
IJKLM -> N
QRSTU -> V
ABCD -> E
X -> Y
GHIJ -> K

Listing 27.37: Sample of Variable Length Input Sequences.

The input sequences vary in length between 1 and max_len and therefore require zero padding. Here, we use left-hand-side (prefix) padding with the Keras built-in pad_sequences() function.

X = pad_sequences(dataX, maxlen=max_len, dtype='float32')

Listing 27.38: Left-Pad Variable Length Input Sequences.
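For a quick feel for what this produces, the one-off sketch below (illustrative only, not part of the chapter's listings) pads a single three-integer pattern to a maximum length of 5; pad_sequences pads on the left by default, so the zeros appear as a prefix.

from keras.preprocessing.sequence import pad_sequences
print(pad_sequences([[14, 15, 16]], maxlen=5, dtype='float32'))
# prints [[ 0.  0. 14. 15. 16.]]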

The trained model is evaluated on randomly selected input patterns. This could just as easily be new randomly generated sequences of characters. I also believe this could be a linear sequence seeded with A, with outputs fed back in as single character inputs. The full code listing is provided below for completeness.

# LSTM with Variable Length Input Sequences to One Character Output
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
from keras.preprocessing.sequence import pad_sequences
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
num_inputs = 1000
max_len = 5
dataX = []
dataY = []
for i in range(num_inputs):
    start = numpy.random.randint(len(alphabet)-2)
    end = numpy.random.randint(start, min(start+max_len,len(alphabet)-1))
    sequence_in = alphabet[start:end+1]
    sequence_out = alphabet[end + 1]
    dataX.append([char_to_int[char] for char in sequence_in])
    dataY.append(char_to_int[sequence_out])
    print(sequence_in, '->', sequence_out)
# convert list of lists to array and pad sequences if needed
X = pad_sequences(dataX, maxlen=max_len, dtype='float32')
# reshape X to be [samples, time steps, features]
X = numpy.reshape(X, (X.shape[0], max_len, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
batch_size = 1
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], 1)))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=batch_size, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for i in range(20):
    pattern_index = numpy.random.randint(len(dataX))
    pattern = dataX[pattern_index]
    x = pad_sequences([pattern], maxlen=max_len, dtype='float32')
    x = numpy.reshape(x, (1, max_len, 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)

Listing 27.39: LSTM Network for Variable Length Sequences to one-char Mapping.

Running this code produces the following output:

Model Accuracy: 98.90%
['Q', 'R'] -> S
['W', 'X'] -> Y
['W', 'X'] -> Y
['C', 'D'] -> E
['E'] -> F
['S', 'T', 'U'] -> V
['G', 'H', 'I', 'J', 'K'] -> L
['O', 'P', 'Q', 'R', 'S'] -> T
['C', 'D'] -> E
['O'] -> P
['N', 'O', 'P'] -> Q
['D', 'E', 'F', 'G', 'H'] -> I
['X'] -> Y
['K'] -> L
['M'] -> N
['R'] -> T
['K'] -> L
['E', 'F', 'G'] -> H
['Q'] -> R
['Q', 'R', 'S'] -> T

Listing 27.40: Output for the LSTM Network for Variable Length Sequences to one-char Mapping.

We can see that although the model did not learn the alphabet perfectly from the randomly generated subsequences, it did very well. The model was not tuned and may require more training or a larger network, or both (an exercise for the reader). This is a good natural extension to the "all sequential input examples in each batch" alphabet model learned above, in that it can handle ad hoc queries, but this time of arbitrary sequence length (up to the max length).
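As a final illustration of those ad hoc queries, the hedged sketch below (not one of the chapter's listings) asks the trained model from Listing 27.39 for the letter that follows an arbitrary subsequence; the query string DEF is just an example.

# hedged sketch: query the variable-length model with an arbitrary subsequence
pattern = [char_to_int[c] for c in "DEF"]                 # example query, not from the book
x = pad_sequences([pattern], maxlen=max_len, dtype='float32')
x = numpy.reshape(x, (1, max_len, 1)) / float(len(alphabet))
index = numpy.argmax(model.predict(x, verbose=0))
print("DEF ->", int_to_char[index])                       # ideally prints G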

27.8 Summary

In this lesson you discovered LSTM recurrent neural networks in Keras and how they manage state. Specifically, you learned:

- How to develop a naive LSTM network for one-character to one-character prediction.
- How to configure a naive LSTM to learn a sequence across time steps within a sample.
- How to configure an LSTM to learn a sequence across samples by manually managing state.

27.9 Next

In this lesson you developed your understanding of how LSTM networks maintain state for simple sequence prediction problems. Up next you will use your understanding of LSTM networks to develop larger text generation models.


Artificial Neural Networks. Introduction to Computational Neuroscience Ardi Tampuu Artificial Neural Networks Introduction to Computational Neuroscience Ardi Tampuu 7.0.206 Artificial neural network NB! Inspired by biology, not based on biology! Applications Automatic speech recognition

More information

Neural Networks (pp )

Neural Networks (pp ) Notation: Means pencil-and-paper QUIZ Means coding QUIZ Neural Networks (pp. 106-121) The first artificial neural network (ANN) was the (single-layer) perceptron, a simplified model of a biological neuron.

More information

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic SEMANTIC COMPUTING Lecture 8: Introduction to Deep Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 7 December 2018 Overview Introduction Deep Learning General Neural Networks

More information

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University LSTM and its variants for visual recognition Xiaodan Liang xdliang328@gmail.com Sun Yat-sen University Outline Context Modelling with CNN LSTM and its Variants LSTM Architecture Variants Application in

More information

Data Analyst Nanodegree Syllabus

Data Analyst Nanodegree Syllabus Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working

More information

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University. Visualizing and Understanding Convolutional Networks Christopher Pennsylvania State University February 23, 2015 Some Slide Information taken from Pierre Sermanet (Google) presentation on and Computer

More information

LSTM for Language Translation and Image Captioning. Tel Aviv University Deep Learning Seminar Oran Gafni & Noa Yedidia

LSTM for Language Translation and Image Captioning. Tel Aviv University Deep Learning Seminar Oran Gafni & Noa Yedidia 1 LSTM for Language Translation and Image Captioning Tel Aviv University Deep Learning Seminar Oran Gafni & Noa Yedidia 2 Part I LSTM for Language Translation Motivation Background (RNNs, LSTMs) Model

More information

CS224n: Natural Language Processing with Deep Learning 1

CS224n: Natural Language Processing with Deep Learning 1 CS224n: Natural Language Processing with Deep Learning 1 Lecture Notes: TensorFlow 2 Winter 2017 1 Course Instructors: Christopher Manning, Richard Socher 2 Authors: Zhedi Liu, Jon Gauthier, Bharath Ramsundar,

More information

Heads Up! (Continued)

Heads Up! (Continued) . Name Date A c t i v i t y 6 Heads Up! (Continued) In this activity, you will do more experiments with simulations and use a calculator program that will quickly simulate multiple coin tosses. The Problem

More information

Deep Learning Cook Book

Deep Learning Cook Book Deep Learning Cook Book Robert Haschke (CITEC) Overview Input Representation Output Layer + Cost Function Hidden Layer Units Initialization Regularization Input representation Choose an input representation

More information

Deep Learning Frameworks. COSC 7336: Advanced Natural Language Processing Fall 2017

Deep Learning Frameworks. COSC 7336: Advanced Natural Language Processing Fall 2017 Deep Learning Frameworks COSC 7336: Advanced Natural Language Processing Fall 2017 Today s lecture Deep learning software overview TensorFlow Keras Practical Graphical Processing Unit (GPU) From graphical

More information

Practical Methodology. Lecture slides for Chapter 11 of Deep Learning Ian Goodfellow

Practical Methodology. Lecture slides for Chapter 11 of Deep Learning  Ian Goodfellow Practical Methodology Lecture slides for Chapter 11 of Deep Learning www.deeplearningbook.org Ian Goodfellow 2016-09-26 What drives success in ML? Arcane knowledge of dozens of obscure algorithms? Mountains

More information

Index. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning,

Index. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning, A Acquisition function, 298, 301 Adam optimizer, 175 178 Anaconda navigator conda command, 3 Create button, 5 download and install, 1 installing packages, 8 Jupyter Notebook, 11 13 left navigation pane,

More information

Lecture 12 Notes Hash Tables

Lecture 12 Notes Hash Tables Lecture 12 Notes Hash Tables 15-122: Principles of Imperative Computation (Spring 2016) Frank Pfenning, Rob Simmons 1 Introduction In this lecture we re-introduce the dictionaries that were implemented

More information

3. Conditional Execution

3. Conditional Execution 3. Conditional Execution Topics: Boolean values Relational operators if statements The Boolean type Motivation Problem: Assign positive float values to variables a and b and print the values a**b and b**a.

More information

Neural Network Optimization and Tuning / Spring 2018 / Recitation 3

Neural Network Optimization and Tuning / Spring 2018 / Recitation 3 Neural Network Optimization and Tuning 11-785 / Spring 2018 / Recitation 3 1 Logistics You will work through a Jupyter notebook that contains sample and starter code with explanations and comments throughout.

More information

Lab 8 CSC 5930/ Computer Vision

Lab 8 CSC 5930/ Computer Vision Lab 8 CSC 5930/9010 - Computer Vision Description: One way to effectively train a neural network with multiple layers is by training one layer at a time. You can achieve this by training a special type

More information

Lecture 12 Hash Tables

Lecture 12 Hash Tables Lecture 12 Hash Tables 15-122: Principles of Imperative Computation (Spring 2018) Frank Pfenning, Rob Simmons Dictionaries, also called associative arrays as well as maps, are data structures that are

More information

Dynamic Routing Between Capsules

Dynamic Routing Between Capsules Report Explainable Machine Learning Dynamic Routing Between Capsules Author: Michael Dorkenwald Supervisor: Dr. Ullrich Köthe 28. Juni 2018 Inhaltsverzeichnis 1 Introduction 2 2 Motivation 2 3 CapusleNet

More information

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon Deep Learning For Video Classification Presented by Natalie Carlebach & Gil Sharon Overview Of Presentation Motivation Challenges of video classification Common datasets 4 different methods presented in

More information

Visual object classification by sparse convolutional neural networks

Visual object classification by sparse convolutional neural networks Visual object classification by sparse convolutional neural networks Alexander Gepperth 1 1- Ruhr-Universität Bochum - Institute for Neural Dynamics Universitätsstraße 150, 44801 Bochum - Germany Abstract.

More information

Lecture 17: Neural Networks and Deep Learning. Instructor: Saravanan Thirumuruganathan

Lecture 17: Neural Networks and Deep Learning. Instructor: Saravanan Thirumuruganathan Lecture 17: Neural Networks and Deep Learning Instructor: Saravanan Thirumuruganathan Outline Perceptron Neural Networks Deep Learning Convolutional Neural Networks Recurrent Neural Networks Auto Encoders

More information

Tensorflow Example: Fizzbuzz. Sargur N. Srihari

Tensorflow Example: Fizzbuzz. Sargur N. Srihari Tensorflow Example: Fizzbuzz Sargur N. srihari@cedar.buffalo.edu 1 Fizzbuzz in Tensorflow Fizzbuzz: Print the numbers from 1 to 100, except that if the number is divisible by 3 print "fizz", if it's divisible

More information

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

Deep Character-Level Click-Through Rate Prediction for Sponsored Search Deep Character-Level Click-Through Rate Prediction for Sponsored Search Bora Edizel - Phd Student UPF Amin Mantrach - Criteo Research Xiao Bai - Oath This work was done at Yahoo and will be presented as

More information

arxiv: v1 [cs.cr] 4 Apr 2017

arxiv: v1 [cs.cr] 4 Apr 2017 Using Echo State Networks for Cryptography R. Ramamurthy, C. Bauckhage, K. Buza, and S. Wrobel Department of Computer Science, University of Bonn, Bonn, Germany arxiv:174.146v1 [cs.cr] 4 Apr 217 Abstract.

More information

Natural Language Processing with Deep Learning CS224N/Ling284. Christopher Manning Lecture 4: Backpropagation and computation graphs

Natural Language Processing with Deep Learning CS224N/Ling284. Christopher Manning Lecture 4: Backpropagation and computation graphs Natural Language Processing with Deep Learning CS4N/Ling84 Christopher Manning Lecture 4: Backpropagation and computation graphs Lecture Plan Lecture 4: Backpropagation and computation graphs 1. Matrix

More information

Deep Learning and Its Applications

Deep Learning and Its Applications Convolutional Neural Network and Its Application in Image Recognition Oct 28, 2016 Outline 1 A Motivating Example 2 The Convolutional Neural Network (CNN) Model 3 Training the CNN Model 4 Issues and Recent

More information

MIXED PRECISION TRAINING: THEORY AND PRACTICE Paulius Micikevicius

MIXED PRECISION TRAINING: THEORY AND PRACTICE Paulius Micikevicius MIXED PRECISION TRAINING: THEORY AND PRACTICE Paulius Micikevicius What is Mixed Precision Training? Reduced precision tensor math with FP32 accumulation, FP16 storage Successfully used to train a variety

More information

CS 4510/9010 Applied Machine Learning

CS 4510/9010 Applied Machine Learning CS 4510/9010 Applied Machine Learning Neural Nets Paula Matuszek Spring, 2015 1 Neural Nets, the very short version A neural net consists of layers of nodes, or neurons, each of which has an activation

More information

arxiv: v1 [stat.ml] 21 Feb 2018

arxiv: v1 [stat.ml] 21 Feb 2018 Detecting Learning vs Memorization in Deep Neural Networks using Shared Structure Validation Sets arxiv:2.0774v [stat.ml] 2 Feb 8 Elias Chaibub Neto e-mail: elias.chaibub.neto@sagebase.org, Sage Bionetworks

More information

1 Topic. Image classification using Knime.

1 Topic. Image classification using Knime. 1 Topic Image classification using Knime. The aim of image mining is to extract valuable knowledge from image data. In the context of supervised image classification, we want to assign automatically a

More information

Tutorial on Machine Learning Tools

Tutorial on Machine Learning Tools Tutorial on Machine Learning Tools Yanbing Xue Milos Hauskrecht Why do we need these tools? Widely deployed classical models No need to code from scratch Easy-to-use GUI Outline Matlab Apps Weka 3 UI TensorFlow

More information

1 Achieving IND-CPA security

1 Achieving IND-CPA security ISA 562: Information Security, Theory and Practice Lecture 2 1 Achieving IND-CPA security 1.1 Pseudorandom numbers, and stateful encryption As we saw last time, the OTP is perfectly secure, but it forces

More information